95 research outputs found

    Semi-Automatic Generation of Adaptive Codes

    Compiler automatic optimization and parallelization techniques are well suited for some classes of simulation or signal processing applications; however, they usually take into account neither domain-specific knowledge nor the possibility of changing or removing some computations to achieve "good enough" results. Quite differently, production simulation and signal processing codes have adaptive capabilities: they are designed to compute precise results only where it matters, when the complete problem is not tractable or when computation time must be short. In this paper, we present a new way to provide adaptive capabilities to compute-intensive codes automatically. It relies on domain-specific knowledge, provided by the programmer through special pragmas in the input code, and on polyhedral compilation techniques to continuously regenerate at runtime a code that performs heavy computations only where it matters at every moment. We present a case study on a fluid simulation application where our strategy enables significant computation savings and speedup in the optimized portion of the application while maintaining good precision, with minimal effort from the programmer.
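    To make the annotation idea concrete, the following is a minimal hand-written sketch of this style of input; the "#pragma adaptive" directives, the block size, and the threshold are invented for illustration and are not the paper's actual pragma syntax.

        /* Hypothetical sketch: the programmer exposes domain knowledge (what to
         * monitor, which cheaper alternative exists), and the compiler/runtime
         * restricts the precise computation to the regions that need it.
         * The pragma names below are invented, not the paper's syntax. */
        void diffuse(int n, double in[n][n], double out[n][n])
        {
            /* hypothetical: monitor the per-block maximum of `in` on 32x32 blocks */
            #pragma adaptive monitor(in, max, block(32, 32))
            /* hypothetical: skip blocks whose monitored value is negligible */
            #pragma adaptive alternative(skip, below(1e-6))
            for (int i = 1; i < n - 1; i++)
                for (int j = 1; j < n - 1; j++)
                    out[i][j] = 0.25 * (in[i - 1][j] + in[i + 1][j]
                                      + in[i][j - 1] + in[i][j + 1]);
        }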

    Automatic adaptive approximation for stencil computations

    Approximate computing is necessary to meet deadlines in some compute-intensive applications such as simulation. Building approximate applications requires a high level of expertise from the application designers as well as a significant development effort. Some application programming interfaces greatly facilitate their conception, but they still rely heavily on the developer's domain-specific knowledge and require many modifications to successfully generate an approximate version of the program. In this paper we present new techniques to semi-automatically discover relevant approximate computing parameters. We believe that superior compiler-user interaction is the key to improved productivity. After pinpointing the region of interest to optimize, the developer is guided by the compiler in making the best implementation choices. Static analysis and runtime monitoring are used to infer approximation parameter values for the application. We evaluated these techniques on multiple application kernels that support approximation and show that, with the help of our method, we achieve performance similar to that of non-assisted, hand-tuned versions while requiring minimal intervention from the user.
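    As a feel for the kind of approximation parameters such compiler guidance would have to settle on, here is a small hand-written sketch (not taken from the paper): a 1D Jacobi-like stencil where low-activity cells are refreshed only every few steps. The function, the threshold, and the period are assumptions standing in for inferred parameter values.

        /* Hypothetical approximate stencil: cells whose recent variation (delta)
         * is below `threshold` keep their old value except every `period`-th
         * step. `threshold` and `period` play the role of the approximation
         * parameters that static analysis and runtime monitoring would choose. */
        void jacobi_step_approx(int n, int step, const double *in, double *out,
                                const double *delta, double threshold, int period)
        {
            for (int i = 1; i < n - 1; i++) {
                if (delta[i] < threshold && step % period != 0)
                    out[i] = in[i];                                  /* approximate */
                else
                    out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;  /* precise */
            }
        }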

    Think Unlimited and Compress Data Automatically

    Developing an application that, when unoptimized, consumes more memory resources than are physically or financially available demands a lot of expertise. In this work, we show that, with the right tools and language abstractions, writing such programs for a given class of applications can stay within reach of non-expert developers. We explore the potential of a compiler-based data layout transformation from a dense array to a compressed tree data structure. This transformation allows easy application prototyping, provides compression, and carries information that can be used with more advanced optimizations, e.g., adaptive and approximate computing techniques. We primarily target partial differential equation solvers and signal processing applications. We evaluate the compression ratio and the error originating from this compressed representation. We suggest multiple exploration paths to produce an automatic adaptive code transformation with compression capabilities from the multiresolution information produced during the transformation.
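    The dense-array-to-compressed-tree idea can be pictured with a quadtree over a 2D grid. The sketch below is an assumption-laden illustration (uniformity test with an epsilon threshold, power-of-two block sizes), not the layout the compiler transformation actually emits.

        #include <stdlib.h>

        /* Hypothetical compressed-tree layout for a dense 2D array: a quadtree
         * node either stores one average value for its whole block (when the
         * block is nearly uniform) or four children covering its quadrants. */
        typedef struct Node {
            double value;            /* block average (meaningful for leaves) */
            struct Node *child[4];   /* all NULL for a leaf */
        } Node;

        /* Build the tree for the size x size block of `a` starting at (row, col);
         * `size` is assumed to be a power of two. Blocks whose values deviate
         * from their average by less than `eps` collapse into a single leaf,
         * which is where both the compression and the error come from. */
        static Node *build(const double *a, int n, int row, int col, int size,
                           double eps)
        {
            Node *node = calloc(1, sizeof *node);
            double sum = 0.0, lo = a[row * n + col], hi = lo;
            for (int i = row; i < row + size; i++)
                for (int j = col; j < col + size; j++) {
                    double v = a[i * n + j];
                    sum += v;
                    if (v < lo) lo = v;
                    if (v > hi) hi = v;
                }
            node->value = sum / (size * size);
            if (size == 1 || hi - lo <= eps)
                return node;                          /* uniform enough: leaf */
            int h = size / 2;
            node->child[0] = build(a, n, row,     col,     h, eps);
            node->child[1] = build(a, n, row,     col + h, h, eps);
            node->child[2] = build(a, n, row + h, col,     h, eps);
            node->child[3] = build(a, n, row + h, col + h, h, eps);
            return node;
        }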

    Pipelined Multithreading Generation in a Polyhedral Compiler

    State-of-the-art automatic polyhedral parallelizers extract and express parallelism as isolated parallel loops. For example, the Pluto high-level compiler generates and annotates loops with "#pragma omp parallel for" directives. Our goal is to take advantage of pipelined multithreading, a parallelization strategy that addresses a wider class of codes, currently not handled by automatic parallelizers. Pipelined multithreading requires interlacing the iterations of some loops in a controlled way that enables their parallel execution. We achieve this using OpenMP clauses such as ordered and nowait. Our method is to: (1) schedule a SCoP using traditional techniques such as Pluto's algorithm; (2) detect potential pipelines in groups of sequential loops; (3) fine-tune the schedule; and (4) generate the resulting code. Fully automatic generation is ongoing work, yet we show on a small set of experiments how pipelined multithreading makes it possible to parallelize programs that would otherwise not be parallelized.
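    As a hand-written illustration of the kind of pipelined execution the ordered and nowait clauses enable (this is not the code the described tool generates, and the produce/consume statements are assumptions), consider a loop whose first statement carries a dependence while its second statement is independent across iterations:

        #define N 1024
        static double a[N], b[N];

        static double produce(int i)  { return i > 0 ? a[i - 1] + 1.0 : 0.0; } /* reads a[i-1] */
        static double consume(double x) { return 2.0 * x; }                    /* independent */

        /* The ordered region serializes the dependent statement across
         * iterations, forming the sequential backbone of the pipeline, while
         * the independent statement of different iterations overlaps on other
         * threads; nowait removes the barrier at the end of the loop so threads
         * may move on to subsequent work in the parallel region. */
        void pipeline(void)
        {
            #pragma omp parallel
            {
                #pragma omp for ordered schedule(static, 1) nowait
                for (int i = 0; i < N; i++) {
                    #pragma omp ordered
                    {
                        a[i] = produce(i);   /* sequential pipeline stage */
                    }
                    b[i] = consume(a[i]);    /* overlapped across threads */
                }
            }
        }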

    Splitting Polyhedra to Generate More Efficient Code: Efficient Code Generation in the Polyhedral Model is Harder Than We Thought

    Code generation in the polyhedral model takes as input a union of Z-polyhedra and produces code scanning all of them. Modern code generation tools rely heavily on polyhedral operations to perform this task. However, these operations are typically provided by general-purpose polyhedral libraries that are not specifically designed to address the code generation problem. In particular, (unions of) polyhedra may be represented in various mathematically equivalent ways which may have different properties with respect to code generation. In this paper, we investigate this problem and try to find the best representation of polyhedra to generate efficient code. We present two contributions. First, we demonstrate that this problem has been largely underestimated, showing significant control overhead deviations when using different representations of the same polyhedra. Second, we propose an improvement to the main algorithm of the state-of-the-art code generation tool CLooG. It generates code with fewer tests in the inner loops, and aims to reduce control overhead and to simplify vectorization for the compiler, at the cost of a larger code size. It is based on a smart splitting of the union of polyhedra while recursing on the dimensions. We implemented our algorithm in CLooG/PolyLib and compared the performance and size of the generated code with the CLooG/isl version.
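    To see why the representation matters, here is a tiny hypothetical instance (not taken from the paper): scanning a square domain for one statement and a triangular subdomain for another either leaves a guard in the innermost loop or, after splitting the domain on one dimension, yields test-free inner loops at the cost of more code.

        #include <stdio.h>

        #define N 8
        static void S1(int i, int j) { printf("S1(%d,%d)\n", i, j); }  /* placeholder statement */
        static void S2(int i, int j) { printf("S2(%d,%d)\n", i, j); }  /* placeholder statement */

        /* Union to scan: S1 on {0 <= i, j < N} and S2 on {0 <= i <= j < N}. */

        void scan_naive(void)
        {
            /* one guard evaluated N*N times in the innermost loop */
            for (int i = 0; i < N; i++)
                for (int j = 0; j < N; j++) {
                    S1(i, j);
                    if (j >= i)
                        S2(i, j);
                }
        }

        void scan_split(void)
        {
            /* same statement instances with the j-dimension split at i:
             * no test in the inner loops, but a larger generated code */
            for (int i = 0; i < N; i++) {
                for (int j = 0; j < i; j++)
                    S1(i, j);
                for (int j = i; j < N; j++) {
                    S1(i, j);
                    S2(i, j);
                }
            }
        }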

    Adaptive Code Refinement: A Compiler Technique and Extensions to Generate Self-Tuning Applications

    Compiler high-level automatic optimization and parallelization techniques are well suited for some classes of simulation or signal processing applications; however, they usually take into account neither domain-specific knowledge nor the possibility of changing or removing some computations to achieve "good enough" results. In contrast, production simulation and signal processing codes have adaptive capabilities: they are designed to compute precise results only where it matters, when the complete problem is not tractable or when computation time must be short. In this paper, we present a new way to provide adaptive capabilities to compute-intensive codes automatically. It relies on domain-specific knowledge, provided by the programmer through special pragmas in the input code, and on polyhedral compilation techniques to continuously regenerate at runtime a code that performs heavy computations only where it matters. We present experimental results on several applications where our strategy enables significant computation savings and speedup while maintaining good precision, with minimal effort from the programmer.
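    A rough way to picture the runtime effect of such self-tuning code (an assumption-based sketch, not the paper's implementation): a monitored quantity is reduced per block, mapped to a precision level, and each block runs the corresponding kernel. In the described approach the dispatch is compiled away, since specialized code is regenerated at runtime instead of branching per block.

        /* Hypothetical per-block dispatch; kernel names and thresholds are
         * invented for illustration. */
        enum level { SKIP, COARSE, PRECISE };

        static void precise_kernel(int bi, int bj) { (void)bi; (void)bj; }  /* stub */
        static void coarse_kernel(int bi, int bj)  { (void)bi; (void)bj; }  /* stub */

        static enum level block_level(double monitored_max)
        {
            if (monitored_max < 1e-6) return SKIP;     /* nothing happens here */
            if (monitored_max < 1e-2) return COARSE;   /* cheap approximation */
            return PRECISE;                            /* full computation */
        }

        void adaptive_sweep(int nb, const double *block_max)
        {
            for (int bi = 0; bi < nb; bi++)
                for (int bj = 0; bj < nb; bj++)
                    switch (block_level(block_max[bi * nb + bj])) {
                    case PRECISE: precise_kernel(bi, bj); break;
                    case COARSE:  coarse_kernel(bi, bj);  break;
                    case SKIP:    /* keep previous values */ break;
                    }
        }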

    Predictive Modeling in a Polyhedral Optimization Space

    High-level program optimizations, such as loop transformations, are critical for high performance on multi-core targets. However, complex sequences of loop transformations are often required to expose parallelism (both coarse-grain and fine-grain) and improve data locality. The polyhedral compilation framework has proved to be very effective at representing these complex sequences and restructuring compute-intensive applications, seamlessly handling both perfectly and imperfectly nested loops. Nevertheless, identifying the most effective loop transformations remains a major challenge. We address the problem of selecting the best polyhedral optimizations with dedicated machine learning models, trained specifically on the target machine. We show that these models can quickly select high-performance optimizations with very limited iterative search. Our end-to-end framework is validated using numerous benchmarks on two modern multi-core platforms. We investigate a variety of machine learning algorithms and hardware counters, and we obtain performance improvements over production compilers ranging on average from 3.2x to 8.7x, by running no more than 6 program variants from a polyhedral optimization space.

    Hybrid Iterative and Model-Driven Optimization in the Polyhedral Model

    On modern architectures, a missed optimization can translate into performance degradations reaching orders of magnitude. More than ever, translating Moore's law into actual performance improvements depends on the effectiveness of the compiler. Moreover, missing an optimization and putting the blame on the programmer is not a viable strategy: we must strive for portability of performance, or the majority of the software industry will see no benefit in future many-core processors. As a consequence, an optimizing compiler must also be a parallelizing one; it must take care of the memory hierarchy and of (re)partitioning computation to best suit the target architecture. Polyhedral compilation is a program optimization and parallelization framework capable of expressing extremely complex transformation sequences. The ability to build and traverse a tractable search space of such transformations remains challenging, and existing model-based heuristics can easily be beaten in identifying profitable parallelism/locality trade-offs. We propose a hybrid iterative and model-driven algorithm for automatic tiling, fusion, distribution and parallelization of programs in the polyhedral model. Our experiments demonstrate the effectiveness of this approach, both in obtaining solid performance improvements over existing auto-parallelizing compilers, and in achieving portability of performance on various modern multi-core architectures.

    Automatic Parallelization and Locality Optimization of Beamforming Algorithms

    This paper demonstrates the benefits of a global optimization strategy, using a new automatic parallelization and locality optimization methodology targeting modern multi-core chips, for the high-performance embedded computing algorithms that occur in adaptive radar systems. As a baseline, the resulting performance was compared against the performance obtained using highly optimized math libraries.

    Parametric Multi-Level Tiling of Imperfectly Nested Loops

    Tiling is a crucial loop transformation for generating high-performance code on modern architectures. Efficient generation of multi-level tiled code is essential for maximizing data reuse in systems with deep memory hierarchies. Tiled loops with parametric tile sizes (not compile-time constants) facilitate runtime feedback and dynamic optimizations used in iterative compilation and automatic tuning. Previous parametric multi-level tiling approaches have been restricted to perfectly nested loops, where all assignment statements are contained inside the innermost loop of a loop nest. Previous solutions to tiling for imperfect loop nests have only handled fixed tile sizes. In this paper, we present an approach to parametric multi-level tiling of imperfectly nested loops. The tiling technique generates loops that iterate over full rectangular tiles, making them amenable to compiler optimizations such as register tiling. Experimental results using a number of computational benchmarks demonstrate the effectiveness of the developed tiling approach.
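    As a reminder of what parametric tiling looks like on a toy, perfectly nested example (the paper's contribution, handling imperfectly nested loops and multiple levels, is not shown here), the sketch below uses runtime tile sizes rather than compile-time constants; the second function illustrates why separating full rectangular tiles from partial ones matters.

        #define MIN(a, b) ((a) < (b) ? (a) : (b))

        /* Parametric single-level tiling: Ti and Tj are runtime parameters.
         * The MIN guards handle partial tiles at the domain borders. */
        void scale_tiled(int n, int Ti, int Tj, double *a, double s)
        {
            for (int it = 0; it < n; it += Ti)
                for (int jt = 0; jt < n; jt += Tj)
                    for (int i = it; i < MIN(it + Ti, n); i++)
                        for (int j = jt; j < MIN(jt + Tj, n); j++)
                            a[i * n + j] *= s;
        }

        /* Full-tile separation in one dimension: the tile body has a constant
         * trip count Ti, making it amenable to register tiling; a clean-up
         * loop handles the remaining partial tile. */
        void scale_full_tiles(int n, int Ti, double *a, double s)
        {
            int full = n - n % Ti;               /* end of the last full tile */
            for (int it = 0; it < full; it += Ti)
                for (int i = it; i < it + Ti; i++)
                    a[i] *= s;
            for (int i = full; i < n; i++)       /* partial tile */
                a[i] *= s;
        }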